Conscious Radix Sort
Abstract
Exploiting data locality in parallel computers is paramount to reducing memory traffic and communication among processing nodes. We focus on the exploitation of locality by Parallel Radix sort. The original Parallel Radix sort has several communication steps in which one sorting key may have to visit several processing nodes. In response to this, we propose a reorganization of Radix sort that leads to a highly local version of the algorithm at a very low cost. As a key feature, our algorithm performs only one communication step, forcing keys to move at most once between processing nodes. Also, our algorithm reduces the amount of data communicated. Finally, the new algorithm achieves a good load balance, which makes it insensitive to skewed data distributions. We call the new version of Parallel Radix sort that combines locality and load balance Communication and Cache Conscious Radix sort (C3-Radix sort). Our results on 16 processors of the SGI O2000 show that C3-Radix sort reduces the execution time of the previous fastest version of Parallel Radix sort by a factor of 3 for data sets larger than 8M keys and by a factor of almost 2 for smaller data sets.

1 Introduction

Large-scale parallel computers include complex memory hierarchies and communication networks. With the design of software that exploits data locality, both the memory hierarchy and the communication network benefit from lower data traffic. We address the important problem of locality exploitation for Parallel Radix sort. Radix sort is the most compet...
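The single-communication-step idea described in the abstract can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions, not the authors' C3-Radix implementation: the function names, the 8-bit digit width, the 32-bit non-negative integer keys, and the greedy bucket-to-node assignment are all hypothetical choices made for illustration. Each simulated node histograms its keys on the most significant digit, contiguous bucket ranges are assigned to nodes so that loads are roughly equal, every key is sent to its owner exactly once, and each node then sorts what it received locally.

```python
def lsd_radix_sort(keys, key_bits=32, digit_bits=8):
    """Plain sequential LSD radix sort, used here as the local sort on each node."""
    mask = (1 << digit_bits) - 1
    for shift in range(0, key_bits, digit_bits):
        buckets = [[] for _ in range(1 << digit_bits)]
        for k in keys:
            buckets[(k >> shift) & mask].append(k)
        # Stable concatenation preserves the order from earlier passes.
        keys = [k for bucket in buckets for k in bucket]
    return keys


def single_step_parallel_sort(node_keys, key_bits=32, digit_bits=8):
    """Simulate a parallel radix sort with exactly one key exchange.

    node_keys: list of key lists, one per simulated processing node.
    Returns (sorted key lists per node, number of keys that changed node).
    Assumption: keys are non-negative integers below 2**key_bits.
    """
    num_nodes = len(node_keys)
    shift = key_bits - digit_bits
    num_buckets = 1 << digit_bits

    # Step 1: histogram on the most significant digit, summed over all nodes.
    global_hist = [0] * num_buckets
    for keys in node_keys:
        for k in keys:
            global_hist[k >> shift] += 1

    # Step 2: assign contiguous bucket ranges to nodes (hypothetical greedy
    # rule): advance to the next node once the running count passes that
    # node's share of the total.
    target = sum(global_hist) / num_nodes
    owner = [0] * num_buckets
    node, running = 0, 0
    for b in range(num_buckets):
        owner[b] = node
        running += global_hist[b]
        if running >= target * (node + 1) and node < num_nodes - 1:
            node += 1

    # Step 3: the single communication step; each key moves at most once.
    received = [[] for _ in range(num_nodes)]
    moved = 0
    for src, keys in enumerate(node_keys):
        for k in keys:
            dst = owner[k >> shift]
            received[dst].append(k)
            if dst != src:
                moved += 1

    # Step 4: purely local sorting, with no further communication.
    return [lsd_radix_sort(r, key_bits, digit_bits) for r in received], moved
```

Because bucket ownership is non-decreasing in the digit value, concatenating the per-node results in node order yields the globally sorted sequence. Note that this sketch cannot split a single bucket across nodes, so it does not balance load when most keys share one most significant digit; handling such skew requires more care than is shown here.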
Similar resources
Sorting on the SGI Origin 2000: Comparing MPI and Shared Memory Implementations
In this paper we analyse the Communication and Cache Conscious Radix sort algorithm, C-Radix, using the distributed and the shared memory parallel programming models. C-Radix was originally proposed based on the idea of the classic Radix sort to exploit the memory hierarchy locality and reduce the amount of communication for distributed memory computers. Here, we implement C-Radix on the SGI Or...
Cache-Conscious Radix-Decluster Projections
As CPUs become more powerful with Moore’s law and memory latencies stay constant, the impact of the memory access performance bottleneck continues to grow on relational operators like join, which can exhibit random access on a memory region larger than the hardware caches. While cache-conscious variants for various relational algorithms have been described, previous work has mostly ignored (the...
The Effect of Local Sort on Parallel Sorting Algorithms
We show the importance of sequential sorting in the context of in-memory parallel sorting of large data sets of 64-bit keys. First, we analyze several sequential strategies such as Straight Insertion, Quick sort, Radix sort and CC-Radix sort. As a consequence of the analysis, we propose a new algorithm that we call Sequential Counting Split Radix sort, SCS-Radix sort. SCS-Radix sort is a combinati...
Modified Pure Radix Sort for Large Heterogeneous Data Set
We have proposed a Modified Pure Radix Sort for Large Heterogeneous Data Set. In this research paper we discuss the problems of radix sort, brief study of previous works of radix sort & present new modified pure radix sort algorithm for large heterogeneous data set. We try to optimize all related problems of radix sort through this algorithm. This algorithm works on the Technology of Distribute...
‘Review of Radix Sort & Proposed Modified Radix Sort for Heterogeneous Data Set in Distributed Computing Environment’
We have proposed a Modified Pure Radix Sort for Large Heterogeneous Data Set. In this research paper we discuss the problems of radix sort, brief study of previous works of radix sort & present new modified pure radix sort algorithm for large heterogeneous data set. We try to optimize all related problems of radix sort through this algorithm. This algorithm works on the Technology of Distribute...